Probability theory

part I

Eva Freyhult

NBIS, SciLifeLab

2022-09-12

Probability

Probability describes how likely an event is to happen.

Axioms of probability

  1. \(0 \leq P(E) \leq 1\)
  2. \(P(S) = 1\)
  3. If \(E\), \(F\) are disjoint events, then \(P(E \cup F) = P(E) + P(F)\)

A probability is always between 0 and 1, where 1 means that the event always happens and 0 that it never happens.

The sample space, \(S\), is the set of all possible outcomes, so the total probability over the whole sample space is always 1.

The probability that either of two disjoint (non-overlapping) events occurs is the sum of their individual probabilities.

The complement, \(E'\), of \(E\) consists of all outcomes in \(S\) not in \(E\), so \(P(E') = 1 - P(E)\).

Conditional probability

Let \(E,F \subseteq S\) be two events such that \(P(E)>0\). The conditional probability of \(F\) given that \(E\) occurs is defined as: \[P(F|E) = \frac{P(E\cap F)}{P(E)}\]

The product rule follows from the definition of conditional probability: for \(E,F \subseteq S\) with \(P(E)>0\), \[P(E \cap F) = P(F|E)P(E)\]
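The definition and the product rule can be checked directly on a small sample space. A minimal Python sketch (the course itself uses R; this is just an illustration) with a fair six-sided die:

```python
from fractions import Fraction

# Sample space of a fair six-sided die; each outcome has probability 1/6.
S = {1, 2, 3, 4, 5, 6}
E = {o for o in S if o > 3}        # event "greater than 3"
F = {o for o in S if o % 2 == 0}   # event "even"

def P(event):
    """Probability of an event under the uniform distribution on S."""
    return Fraction(len(event), len(S))

# P(F|E) = P(E n F) / P(E) = (1/3) / (1/2) = 2/3
p_F_given_E = P(E & F) / P(E)
print(p_F_given_E)  # 2/3: of the outcomes {4, 5, 6}, two are even

# Product rule: P(E n F) = P(F|E) P(E)
assert P(E & F) == p_F_given_E * P(E)
```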

Random variables

A random variable describes the outcome of a random experiment.

  • The weight of a random newborn baby, \(W\), \(P(W>4.0kg)\)
  • The smoking status of a random mother, \(S\), \(P(S=1)\)
  • The hemoglobin concentration in blood, \(Hb\), \(P(Hb<125 g/L)\)
  • The number of mutations in a gene, \(M\)
  • BMI of a random man, \(B\)
  • Weight status of a random man (underweight, normal weight, overweight, obese), \(W\)
  • The result of throwing a die, \(X\)

Random variables

A random variable cannot be predicted exactly, but the probability of all possible outcomes can be described.

Random variables: \(X, Y, Z, \dots\), in general denoted by a capital letter.

Probability: \(P(X=5)\), \(P(Z>0.34)\), \(P(W \geq 3.5 | S = 1)\)

Observations of the random variable, \(x, y, z, \dots\)

The sample space is the collection of all possible observation values.

The population is the collection of all possible observations.

A sample is a subset of the population.

The urn model

 

 

By drawing balls from the urn, with or without replacement, probabilities and other properties of the model can be inferred.
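Such draws are easy to simulate. A Python sketch (the course uses R; the urn composition here is a made-up example) estimating the probability of exactly two white balls in three draws without replacement:

```python
import random

random.seed(1)

# A hypothetical urn: 6 white and 4 black balls.
urn = ["white"] * 6 + ["black"] * 4

n_draws, n_sim = 3, 100_000
count = 0
for _ in range(n_sim):
    draw = random.sample(urn, n_draws)      # without replacement
    # draw = random.choices(urn, k=n_draws) # use this for with replacement
    if draw.count("white") == 2:
        count += 1

p_hat = count / n_sim
print(p_hat)  # close to the exact value C(6,2)*C(4,1)/C(10,3) = 0.5
```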

Discrete random variables

A discrete random variable has a countable number of possible values.

For example \(\{1,2,3,4,5,6\}\); {red, blue, green}; {tiny, small, average, large, huge}; or all integers.

A discrete random variable can be described by its probability mass function, pmf.

The probability that the random variable, \(X\), takes the value \(x\) is denoted \(P(X=x) = p(x)\).

Remember that:

  1. \(0 \leq p(x) \leq 1\), a probability is always between 0 and 1.
  2. \(\sum p(x) = 1\), the sum over all possible outcomes is 1.

Example: a fair six-sided die

 

 

Possible outcomes: \(\{1, 2, 3, 4, 5, 6\}\)

Example: a fair six-sided die

The probability mass function:

  x     1      2      3      4      5      6
  p(x)  0.167  0.167  0.167  0.167  0.167  0.167
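The pmf can be represented directly in code. A small Python sketch (illustrative; the course uses R) that checks the two pmf conditions and computes an event probability:

```python
from fractions import Fraction

# pmf of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The two pmf conditions: each p(x) lies in [0, 1], and the total is 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1

# Probabilities of events are sums over outcomes, e.g. P(X > 4):
p_gt_4 = sum(p for x, p in pmf.items() if x > 4)
print(p_gt_4)  # 1/3
```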

Example: Smoking status

The random variable has two possible outcomes; non-smoker (0) and smoker (1). The probability of a random mother being a smoker is 0.39.

         non-smoker  smoker
  x          0          1
  p(x)      0.61       0.39

Example: Number of bacterial colonies

Expected value

When the probability mass function is known, the expected value of the random variable, the population mean, can be computed.

For a uniform distribution, where each of the \(N\) objects has the same probability:

\[E[X] = \mu = \frac{1}{N}\sum_{i=1}^N x_i\]

In general,

\[E[X] = \mu = \sum_{i=1}^N x_i p(x_i)\]
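Applying the general formula to the fair die above, a short Python sketch (illustrative; the course uses R):

```python
from fractions import Fraction

# pmf of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum over x of x * p(x)
mu = sum(x * p for x, p in pmf.items())
print(mu)  # 7/2, i.e. 3.5
```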

Expected value

Linear transformations and combinations

\[E[aX] = a E[X]\] \[E[X + Y] = E[X] + E[Y]\]

\[E[aX + bY] = aE[X] + bE[Y]\]
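Linearity can be verified exactly on a small example. A Python sketch (illustrative; the course uses R) computing \(E[X+Y]\) for two fair dice directly from the joint distribution:

```python
from fractions import Fraction
from itertools import product

outcomes = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

# E[X] for a single fair die
EX = sum(x * p for x in outcomes)

# E[X + Y] computed directly over the joint distribution of two dice
EXY = sum((x + y) * p * p for x, y in product(outcomes, outcomes))

# Linearity: E[X + Y] = E[X] + E[Y]
assert EXY == EX + EX
print(EXY)  # 7
```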

Variance

The variance is a measure of spread and is defined as the expected value of the squared distance from the population mean:

\[var(X) = \sigma^2 = E[(X-\mu)^2] = \sum_{i=1}^N (x_i-\mu)^2 p(x_i)\]
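For the fair die, this definition gives \(\sigma^2 = 35/12 \approx 2.92\). A Python sketch (illustrative; the course uses R):

```python
from fractions import Fraction

# pmf of a fair six-sided die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())  # 7/2

# var(X) = E[(X - mu)^2]
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
print(var)  # 35/12
```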

Variance

Linear transformations and combinations

\[var(aX) = a^2 var(X)\]

For independent random variables \(X\) and \(Y\),

\[var(aX + bY) = a^2var(X) + b^2var(Y)\]

Note: \(X\) and \(Y\) are independent if \(P(X \cap Y)=P(X)P(Y)\), or equivalently \(P(X|Y)=P(X)\).

Simulate distributions

Once the distribution is known, we can compute probabilities, such as \(P(X=a), P(X<a)\) and \(P(X \geq a)\).

If the distribution is not known, simulation might be the solution.

Simulate distributions

In a single coin toss the probability of heads is 0.5.

In 20 coin tosses, what is the probability of at least 15 heads?

The outcome of a single coin toss is a random variable, \(X\), that can be described using an urn model.

Simulation in R!
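The R simulation itself is not shown here; as an illustration, the same Monte Carlo estimate can be sketched in Python:

```python
import random

random.seed(1)

n_sim = 100_000
count = 0
for _ in range(n_sim):
    # One experiment: toss a fair coin 20 times and count heads.
    heads = sum(random.random() < 0.5 for _ in range(20))
    if heads >= 15:
        count += 1

# Estimated P(at least 15 heads in 20 tosses)
p_hat = count / n_sim
print(p_hat)  # close to the exact value, about 0.021
```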

Bernoulli trial

A Bernoulli trial is a random experiment with two outcomes; success (1) and failure (0).

The outcome of a Bernoulli trial is a discrete random variable, \(X\).

The probability of success is constant, \(P(X=1) = p\).

It follows that the probability of failure is \(P(X=0) = 1 - p\).

Using the definitions of expected value and variance it can be shown that;

\[E[X] = p\\ var(X) = p(1-p)\]
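For reference, the derivation from the definitions above (with \(\mu = p\)):

```latex
\begin{aligned}
E[X] &= 1 \cdot p + 0 \cdot (1-p) = p \\
var(X) &= E[(X-\mu)^2] = (1-p)^2\, p + (0-p)^2\, (1-p) \\
       &= p(1-p)\big[(1-p) + p\big] = p(1-p)
\end{aligned}
```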

Binomial distribution

The number of successes in a series of \(n\) independent and identical Bernoulli trials is a discrete random variable, \(X\).

\(X = \sum_{i=1}^n Z_i,\)

where all \(Z_i\) describe the outcomes of independent and identical Bernoulli trials with probability \(p\) of success.

The probability mass function of \(X\), called the binomial distribution, is

\[P(X=k) = {n \choose k} p^k (1-p)^{n-k}\]

\[E[X] = np\\ var(X) = np(1-p)\]

In R: pbinom to compute \(P(X \leq k)\) and dbinom to compute the pmf \(P(X=k)\).
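The coin-toss question above can also be answered exactly from the pmf. A Python sketch (the helper mirrors R's dbinom; illustrative only):

```python
from math import comb

def dbinom(k, n, p):
    """Binomial pmf P(X = k), analogous to R's dbinom."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact P(at least 15 heads in 20 fair coin tosses)
p_at_least_15 = sum(dbinom(k, 20, 0.5) for k in range(15, 21))
print(p_at_least_15)  # about 0.0207
```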

Hypergeometric distribution

The hypergeometric distribution describes the number of successes in a series of \(n\) draws without replacement from a population of size \(N\) with \(Np\) objects of interest (successes).

The probability mass function:

\[P(X=k) = \frac{{Np \choose k}{N-Np \choose n-k}}{N \choose n}\]

In R: phyper to compute \(P(X \leq k)\) and dhyper to compute the pmf \(P(X=k)\).
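The pmf follows directly from binomial coefficients. A Python sketch with made-up numbers (a population of 50 with 10 successes, 5 draws; illustrative only):

```python
from math import comb

def dhyper_pmf(k, N, Np, n):
    """Hypergeometric pmf: P(k successes in n draws without replacement
    from N objects, of which Np are successes)."""
    return comb(Np, k) * comb(N - Np, n - k) / comb(N, n)

# Hypothetical example: P(exactly 2 successes in 5 draws)
p2 = dhyper_pmf(2, N=50, Np=10, n=5)
print(p2)  # about 0.21
```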

Poisson distribution

The Poisson distribution describes the number of times a rare event occurs in a large number of trials.

Poisson distribution

The probability mass function:

\[P(X=k) = \frac{\mu^k}{k!}e^{-\mu},\] where \(\mu\) is the expected value, \(\mu = n \pi\), with \(n\) the number of objects sampled from the population and \(\pi\) the probability of the event for a single object.

The Poisson distribution can approximate the binomial distribution when \(n\) is large and \(\pi\) is small (e.g. \(n>10\), \(\pi < 0.1\)).

In R: ppois to compute \(P(X \leq k)\) and dpois to compute the pmf \(P(X=k)\).
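The approximation can be checked numerically. A Python sketch with made-up numbers (\(n=1000\), \(\pi=0.003\); the helpers mirror R's dpois and dbinom, illustrative only):

```python
from math import comb, exp, factorial

def dpois(k, mu):
    """Poisson pmf P(X = k), analogous to R's dpois."""
    return mu**k / factorial(k) * exp(-mu)

def dbinom(k, n, p):
    """Binomial pmf P(X = k), analogous to R's dbinom."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# n large, pi small: the Poisson with mu = n*pi approximates the binomial.
n, pi = 1000, 0.003  # hypothetical example
mu = n * pi          # 3.0
for k in range(5):
    print(k, dbinom(k, n, pi), dpois(k, mu))  # pairs agree closely
```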

Poisson distribution

Example

The probability that a single individual catches a rare disease is very low. The number of individuals in a large population who catch the disease in a certain time period can then be modelled using the Poisson distribution.